Conversation
@abergeron how do you suggest we expose the run-time option to the user? Could this setting live in the context and be passed at context init?
DEF_PROC_V2(cuMemAlloc, (CUdeviceptr *dptr, size_t bytesize));
DEF_PROC_V2(cuMemFree, (CUdeviceptr dptr));
DEF_PROC_V2(cuMemAllocHost, (void **pp, size_t bytesize));
DEF_PROC(cuMemAllocManaged, (CUdeviceptr* dptr, size_t bytesize, unsigned int flags));
I tried to use DEF_PROC_V2 here and it failed. Why? How does this work? I followed the code but didn't understand it.
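One likely explanation, sketched under the assumption that DEF_PROC_V2 looks up the driver symbol with a `_v2` suffix while DEF_PROC uses the plain name: the CUDA driver exports `cuMemAlloc_v2`, `cuMemFree_v2` and `cuMemAllocHost_v2`, but `cuMemAllocManaged` has no `_v2` variant, so a suffixed lookup finds nothing. The macros `LOOKUP` and `LOOKUP_V2`, the function `fake_sym`, and the symbol table below are hypothetical stand-ins for the real loader, not the library's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the symbols exported by the CUDA driver library. */
static const char *const exported[] = {
    "cuMemAlloc_v2", "cuMemFree_v2", "cuMemAllocHost_v2", "cuMemAllocManaged",
};

/* Hypothetical stand-in for a dlsym()-style lookup: returns the matching
 * table entry, or NULL if the symbol is not exported. */
static const char *fake_sym(const char *name) {
    size_t i;
    for (i = 0; i < sizeof(exported) / sizeof(exported[0]); i++)
        if (strcmp(exported[i], name) == 0)
            return exported[i];
    return NULL;
}

/* Assumed behaviour: plain lookup vs. lookup with "_v2" appended. */
#define LOOKUP(name)    fake_sym(#name)
#define LOOKUP_V2(name) fake_sym(#name "_v2")
```

With these definitions, `LOOKUP_V2(cuMemAlloc)` and `LOOKUP(cuMemAllocManaged)` succeed, while `LOOKUP_V2(cuMemAllocManaged)` returns NULL, which would match the failure described here.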
Since this doesn't seem to make a difference in speed (as long as you don't oversubscribe), I'm inclined to enable this by default. However, @nouiz wants some sort of user warning about the potential slowdown if you actually exceed the memory available on the GPU. This could be a simple fprintf(), but I dislike chatty libraries for things that are rather common.
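A possible middle ground between silence and a chatty library is to warn only the first time the allocated total exceeds device memory. This is a minimal sketch, not the library's code: `alloc_tracker` and `track_alloc` are hypothetical names, and in real code the device total would have to be queried from the driver (for example with `cuMemGetInfo`).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-context bookkeeping. */
typedef struct {
    size_t device_total; /* physical memory on the GPU, in bytes */
    size_t allocated;    /* bytes currently allocated as managed memory */
    int warned;          /* nonzero once the warning has been printed */
} alloc_tracker;

/* Record an allocation; returns 1 if this call printed the warning. */
static int track_alloc(alloc_tracker *t, size_t bytes) {
    t->allocated += bytes;
    if (t->allocated > t->device_total && !t->warned) {
        t->warned = 1;
        fprintf(stderr, "warning: managed allocations exceed GPU memory; "
                        "expect slowdowns from page migration\n");
        return 1;
    }
    return 0;
}
```

The `warned` flag keeps the message to a single line per context, however many allocations follow.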
I only timed large matrices on Pascal. I still need to time small
matrices, in case it adds overhead, and to test on a pre-Pascal GPU.
(Email reply to abergeron's comment above, dated Tue, Apr 11, 2017 at 4:53 PM.)
I tested on a 750. The speed was about the same, but I wasn't able to allocate as much memory! So we can't enable this by default.
@abergeron suggested that we could maybe enable it automatically only on Pascal GPUs.
TODO: Make this a user option.
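One way the user option and the Pascal-only default could fit together is sketched below. The enum `managed_opt` and the function `use_managed` are hypothetical, as is the idea of passing the option as a flag at context init. The sketch also assumes that "Pascal" can be detected as compute capability major version 6, and that newer GPUs should be treated the same way.

```c
/* Hypothetical user option, e.g. passed as a flag at context init. */
typedef enum { MANAGED_AUTO, MANAGED_OFF, MANAGED_ON } managed_opt;

/* Decide whether to allocate with cuMemAllocManaged.
 * cc_major is the device's compute capability major version
 * (6 for Pascal). */
static int use_managed(managed_opt opt, int cc_major) {
    switch (opt) {
    case MANAGED_ON:  return 1;
    case MANAGED_OFF: return 0;
    default:          return cc_major >= 6; /* auto: Pascal or newer */
    }
}
```

An explicit `MANAGED_ON` still lets a user opt in on older hardware, at the cost of the smaller usable memory reported on the 750.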
I tested it with oversubscription on a DGX-1, and check_blas now passes with M, N, K == 30k, which it did not before this PR. So it allows some oversubscription.